Abstract

In this paper, we present our participation in the Medical Records Track of TREC 2012. We
focus on the impact of combining the word space and the concept space in the information
retrieval process. For this track, we submitted a baseline run employing the In_expC2
weighting model implemented in the Terrier platform, which achieved fair results (0.304
MAP, 0.51 P@10). Then, we expanded the documents by performing automatic text
conceptualization of the medical records using UMLS® and the MetaMap software. These combined
textual and conceptual representations, still using the DFR model, led to a slightly lower
precision (0.29 MAP, 0.47 P@10). We also automatically extended the topics with UMLS®
concepts, which led to a lower precision still (0.27 MAP, 0.46 P@10). Lastly, we experimented
with concept-only retrieval, including semantic IR measures (at best 0.21 MAP, 0.41 P@10).

Keywords:  DFR,  In_expC2,  Automatic  Expansion,  Medical  Record  Retrieval,  UMLS, 
Conceptualization,  Semantic  IR. 


1.  Introduction 

The goal of the Medical Records track is to foster research on providing content-based access
to the free text of electronic medical records. To this end, we propose to combine
conceptualization, document and query expansion, and the DFR (Divergence from
Randomness) [1] matching model. We used the Terrier^1 platform for indexing, retrieval and
expansion, and MetaMap®^2 for the conceptualization process.

First, we built a free-text index of the medical records and applied a DFR
matching model with query expansion. Then, we expanded the documents with concepts
extracted from UMLS® and again applied a DFR matching model. Finally, we also extended the
queries with concepts.

The paper is organized as follows: Section 2 describes our system architecture,
outlining each component across the three runs. Experimental results are presented and
discussed in Section 3. Section 4 gives a conclusion and perspectives.


2.  System  architecture 

We propose three strategies to match the user’s query against the documents. We
begin this section by explaining each strategy and outlining each component.


^1 http://www.terrier.org/

^2 http://metamap.nlm.nih.gov/
Each strategy (numbered 1, 2 and 3 in Fig. 1) corresponds to a submitted run. In our first
strategy (1), run LSIS1, we indexed the set of documents with the Terrier platform and
retrieved documents using the DFR model with default query expansion. In the second
strategy (2), run LSIS2, we built a second index combining the original documents and their
associated concepts, as identified by the MetaMap software. As for run LSIS1, we then used
the DFR model to retrieve the documents for each topic. Finally, in the third strategy (3),
run LSIS3, we added a query conceptualization phase to the second strategy, i.e. we matched
the extended query (the original tokens and the concepts) against the extended documents.
The aim of the second strategy was to measure how much conceptualizing the documents alone
affected the weights of the words.


Fig 1. The system’s architecture. #1, 2 and 3 represent the submitted runs LSIS1, LSIS2 and
LSIS3 respectively; DFR is the IR model (Divergence From Randomness); PSR means Pseudo
Relevance Feedback.

2.1.  Index  Building 

We chose the medical report as the indexing unit. We indexed the TEXT field and kept
DOCNO as the report identifier and VISITID as the visit identifier (required to group the
reports belonging to the same visit). We used the Terrier IR platform [2] for indexing,
applying the Porter stemming algorithm [3] and the standard stop word list. We applied the
same processing to the topics.

2.2.  Matching  model 

We took the relevance score between the query and a visit V to be the maximum score
between the query (topic) q and the reports d of that visit:

$RSV(V, q) = \max_{d \in V} score(d, q)$    (1)
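
The visit-level aggregation of equation (1) can be sketched as follows. This is only an illustration: the function names, the report identifiers and the scores are made up, and the per-report scores are assumed to come from whatever retrieval model is in use.

```python
def visit_scores(report_scores, report_to_visit):
    """Map each visit to the maximum score of its reports (equation 1)."""
    best = {}
    for docno, score in report_scores.items():
        visit = report_to_visit[docno]
        if visit not in best or score > best[visit]:
            best[visit] = score
    return best


def rank_visits(report_scores, report_to_visit):
    """Return (visit, score) pairs sorted by decreasing visit score."""
    scores = visit_scores(report_scores, report_to_visit)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical report scores (keyed by DOCNO) and DOCNO -> VISITID mapping.
example_scores = {"R1": 2.1, "R2": 5.4, "R3": 3.3}
example_visits = {"R1": "V1", "R2": "V1", "R3": "V2"}
ranking = rank_visits(example_scores, example_visits)
```

Taking the maximum means that a visit is ranked by its single best-matching report, regardless of how many reports it contains.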

We submitted runs using the DFR weighting model In_expC2 (Inverse Expected
Document Frequency model with Bernoulli after-effect and normalization) [4][5]. We then
applied a query expansion technique based on the default Bose-Einstein 1 (Bo1) expansion
model.

According  to  the  In_expC2  model,  the  relevance  score  of  a  document  d  for  a  query  q  is 
given  by: 


$score(d, q) = \sum_{t \in q \cap d} qtf \times w(t, d)$    (2)


where qtf is the frequency of term t in the query q, and w(t, d) is the relevance score of a
document d for the query term t, given by:


$w(t, d) = \frac{F_t + 1}{n_t \times (tfn_e + 1)} \times \left( tfn_e \times \log_2 \frac{N + 1}{n_e + 0.5} \right)$    (3)


where: 

- F_t is the term frequency of t in the whole collection.

- N is the number of documents in the whole collection.

- n_t is the document frequency of t.

- n_e is the expected number of documents containing the term according to the binomial
distribution, given by:

$n_e = N \times \left( 1 - \left( 1 - \frac{n_t}{N} \right)^{F_t} \right)$    (4)

- tfn_e is the normalized within-document frequency of the term t in the document d. It
is given by the second normalization [4][5]:

$tfn_e = tf \times \log_e \left( 1 + c \times \frac{avg\_l}{l} \right)$    (5)

where c is a normalization parameter, tf is the within-document frequency of the term t in
the document d, l is the document length, and avg_l is the average document length in the
whole collection.
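
To make the chain of definitions concrete, here is a minimal sketch of the term weight and document score, assuming the commonly cited form of In_expC2 (equations 2 to 5). It is not the Terrier implementation; the function names, the default value of c and the collection statistics in the example are illustrative assumptions.

```python
import math


def expected_doc_count(N, n_t, F_t):
    """n_e: expected number of documents containing the term (assumed form of eq. 4)."""
    return N * (1.0 - (1.0 - n_t / N) ** F_t)


def normalized_tf(tf, doc_len, avg_len, c=1.0):
    """tfn_e: term frequency normalized by document length (eq. 5)."""
    return tf * math.log(1.0 + c * avg_len / doc_len)


def in_expc2_weight(tf, doc_len, avg_len, N, n_t, F_t, c=1.0):
    """w(t, d): weight of one term in one document (assumed form of eq. 3)."""
    tfn_e = normalized_tf(tf, doc_len, avg_len, c)
    n_e = expected_doc_count(N, n_t, F_t)
    after_effect = (F_t + 1.0) / (n_t * (tfn_e + 1.0))
    return after_effect * tfn_e * math.log2((N + 1.0) / (n_e + 0.5))


def score(query_tf, doc_tf, doc_len, avg_len, N, stats, c=1.0):
    """score(d, q): sum over terms present in both query and document (eq. 2).

    stats maps each term to its (n_t, F_t) collection statistics.
    """
    total = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # the term does not occur in the document
        n_t, F_t = stats[term]
        total += qtf * in_expc2_weight(tf, doc_len, avg_len, N, n_t, F_t, c)
    return total


# Hypothetical collection of 1000 documents; "asthma" occurs 20 times in 10 documents.
example = score({"asthma": 1}, {"asthma": 2}, 100, 100, 1000, {"asthma": (10, 20)})
```

With these formulas, a longer-than-average document yields a smaller tfn_e for the same raw term frequency, which is the purpose of the length normalization.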

2.3.  Conceptualization  using  MetaMap 

We extended the documents and the queries with medical concepts extracted from the
UMLS ontology. For this purpose we used MetaMap, a system developed by the U.S. National
Library of Medicine [6]. Comparisons with human subjects have shown that MetaMap is
effective in concept identification tasks [7]. MetaMap first analyses the input text and
produces a ranked list of candidate concepts; each candidate has a score that is used to
select the appropriate concepts. We can thus either keep only the highest-scoring concepts,
which we call the best concept strategy, or keep all candidate concepts, which we call the
all concepts strategy. For the experiments described here, we employed the best concept
strategy. Fig 2 shows an example of mapping the original topic number 137 to UMLS concepts.
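
The two selection strategies can be sketched as below. This does not call MetaMap: the candidate lists are assumed to be already available as (concept identifier, score) pairs per phrase, and the scores and the second candidate of the last phrase are invented for illustration. Only the four concept identifiers kept by the best concept strategy come from the topic 137 example.

```python
def best_concepts(phrase_candidates):
    """Keep, for each phrase, the candidate concept(s) with the highest score."""
    selected = []
    for _phrase, candidates in phrase_candidates:
        if not candidates:
            continue  # phrase without any mapping
        top = max(score for _cui, score in candidates)
        selected.extend(cui for cui, score in candidates if score == top)
    return selected


def all_concepts(phrase_candidates):
    """Keep every candidate concept of every phrase."""
    return [cui for _phrase, candidates in phrase_candidates for cui, _score in candidates]


def expand(text, concepts):
    """Append the concept identifiers to the original text as extra tokens."""
    return text + " " + " ".join(concepts)


topic = "Patients with inflammatory disorders receiving TNF-inhibitor treatments."
# Hypothetical candidate lists; scores are placeholders, not MetaMap output.
candidates = [
    ("patients", [("C0030705", 1000)]),
    ("inflammatory disorders", [("C1290884", 1000)]),
    ("receiving", [("C1514756", 1000)]),
    ("TNF-inhibitor treatments", [("C1999216", 800), ("C1522326", 600)]),
]
expanded_topic = expand(topic, best_concepts(candidates))
```

The expanded text keeps the original words and adds the identifiers, so a single index can hold both the word space and the concept space.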

Mapping text to concepts aims to overcome some of the vocabulary mismatch that
might exist in medical text by mapping different terms to the same underlying concept.

In Fig 2, patients maps to the concept C0030705, inflammatory disorders maps to
C1290884, receiving maps to C1514756, and TNF-inhibitor treatment maps to C1999216. In
fact, these concepts do not represent the original topic well: the SNOMED-CT concept that
represents the TNF inhibitor is C1579324 (Tumor Necrosis Factor (TNF) inhibitors), whereas
the concept C1999216 mapped by MetaMap only represents inhibitors, and the concept that
represents the treatment is Treating (C1522326). We found several such examples, which show
that preprocessing will be needed in the future to improve the conceptualization.


Patients  with  inflammatory 
disorders  receiving  TNF-inhibitor 
treatments. 


C0030705  C1290884 
C1514756  C1999216 


Fig  2.  An  example  of  mapping  a  medical  document  to  UMLS  concepts. 
2.4. Pseudo-relevance feedback for query expansion

The query expansion (pseudo relevance feedback) mechanism we employed with
Terrier, both without conceptualization (run LSIS1) and after conceptualization (runs LSIS2
and LSIS3), is a generalization of Rocchio's method [8]. It adds terms from the top-ranked
retrieved documents to the query and reweights the query terms by taking the pseudo
relevance set into account. We used the Bo1 expansion model, which is based on Bose-Einstein
statistics and on the DFR framework (its effectiveness is reported in [1][2][9]). The weight
w of a term t in the top-ranked documents is given by:

$w(t) = tf_x \times \log_2 \frac{1 + P_n}{P_n} + \log_2(1 + P_n)$    (6)


where tf_x is the frequency of the query term in the top-ranked documents, P_n is given by
F_t / N, F_t is the frequency of the term t in the collection, and N is the number of
documents in the collection. Then, the query term weight qtw after merging the top-ranked
document terms with the original terms is given by:


$qtw = \frac{qtf}{qtf_{max}} + \frac{w(t)}{\lim_{F \to tf_x} w(t)}$

$\lim_{F \to tf_x} w(t) = F_{max} \times \log_2 \frac{1 + P_{n,max}}{P_{n,max}} + \log_2(1 + P_{n,max})$    (7)


where the limit of w(t) as F tends to tf_x is the upper bound of w(t) in (6), P_{n,max} is
given by F_max / N, and F_max is the frequency F of the term with the maximum w(t) in the
top-ranked documents. If an original query term does not appear in the terms extracted from
the top-ranked documents, its query term weight remains equal to the original one.
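
The reweighting step can be illustrated with the following sketch, which assumes the usual Bo1 formulas (equations 6 and 7). It is not Terrier's code: the function names, the example terms and their statistics are hypothetical, and the selection of the top-ranked documents and of the expansion terms is left out.

```python
import math


def bo1_weight(tf_x, F_t, N):
    """w(t): Bo1 weight of a term seen tf_x times in the top-ranked documents (eq. 6)."""
    p_n = F_t / N
    return tf_x * math.log2((1.0 + p_n) / p_n) + math.log2(1.0 + p_n)


def bo1_upper_bound(F_max, N):
    """Upper bound of w(t): the value reached when tf_x equals F_max (eq. 7)."""
    return bo1_weight(F_max, F_max, N)


def expanded_query_weights(query_tf, feedback, N):
    """qtw for original and expansion terms.

    query_tf maps original query terms to their frequency in the query.
    feedback maps expansion terms to (tf_x, F_t).
    """
    qtf_max = max(query_tf.values())
    # Original terms keep their normalized frequency if they get no feedback weight.
    qtw = {term: qtf / qtf_max for term, qtf in query_tf.items()}
    if not feedback:
        return qtw
    weights = {t: bo1_weight(tf_x, F_t, N) for t, (tf_x, F_t) in feedback.items()}
    best_term = max(weights, key=weights.get)
    bound = bo1_upper_bound(feedback[best_term][1], N)
    for term, weight in weights.items():
        qtw[term] = qtw.get(term, 0.0) + weight / bound
    return qtw


# Hypothetical statistics for a collection of 1000 documents.
qtw = expanded_query_weights(
    {"asthma": 1, "child": 1},
    {"asthma": (5, 50), "inhaler": (3, 20)},
    1000,
)
```

In this example "child" keeps its original weight because it is absent from the feedback terms, "asthma" is boosted, and "inhaler" enters the query with a weight below that of the original terms.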


3.  Results 

3.1  Official  TREC  2012  results 

The results of our system (Table 1) show that the term-based approach LSIS1 gives fair
results. A slightly lower precision was expected for LSIS2, since conceptualizing only the
documents adds some noise to the word space. However, the result for run LSIS3, where
concepts were added to both documents and topics, shows that our combination (document and
query expansion with concepts) did not improve precision either. Indeed, Fig. 3 shows that,
for each topic, the behavior of the system changed only marginally across the three
strategies. We conclude that, in our experiments, the word space was good enough for
retrieving the documents with an appropriate ranking, and that the concepts added to this
space through the conceptualization phase did not contribute effectively to improving
document retrieval.


Submitted run   MAP      P@10     R-prec   bpref
LSIS1           0.3044   0.5064   0.3340   0.3517
LSIS2           0.2884   0.4681   0.3181   0.3313
LSIS3           0.2690   0.4553   0.3065   0.3094

Table 1. Performance comparison of our three runs (TREC 2012 topics)
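
The size of the drop relative to the baseline can be worked out directly from the MAP column of Table 1. The short sketch below does this arithmetic; the helper name is ours and nothing beyond the table values is assumed.

```python
def relative_change(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


# MAP values copied from Table 1.
map_scores = {"LSIS1": 0.3044, "LSIS2": 0.2884, "LSIS3": 0.2690}
baseline_map = map_scores["LSIS1"]
for run, value in map_scores.items():
    print(f"{run}: {relative_change(baseline_map, value):+.1f}% MAP vs LSIS1")
```

In relative terms, LSIS2 is roughly 5% and LSIS3 roughly 12% below the LSIS1 baseline in MAP.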


3.2 Unsubmitted runs: concepts only

We developed two more approaches for testing conceptualization. The first approach
(hereafter “DFR-Concept”) employs a DFR model to rank the documents while keeping
only the mapped concepts (all the original words were removed). The second approach
(hereafter “Semantic IR”) uses a semantic similarity measure on the concepts to rank
the documents.
Table 2 shows the MAP, P@10 and R-prec for the topics of the TREC Medical Records track 2011
and 2012, and Fig. 4 compares the two approaches with regard to the MAP of each
2012 topic.

The MAP for each topic was lower than with the term-based approach (Table
1). The main advantage of a semantic measure is that it takes into account the amount of
semantic information shared between two concepts in the ontology. This is not the case
for the DFR-Concept approach, in which every concept is independent.
Unfortunately, this theoretical advantage did not produce better results, even though the
DFR model is not necessarily suited to conceptual distributions.

A  semantic  similarity  measure  exploits  an  ontology  for  computing  the  similarity 
between  two  concepts.  For  computing  the  similarity  between  two  groups  of  concepts  (the 
concepts  of  a  topic  and  the  concepts  of  a  document)  we  have  to  employ  an  aggregation 
measure. 

Semantic similarity measures can generally be partitioned into four categories: those
based on how close the two concepts are in the ontology (structure-based measures), those
based on how much information the two concepts share (information content measures), those
based on the properties of the concepts (feature-based measures), and those based on
combinations of the previous options (hybrid measures) [10].

We experimented with a structure-based measure, that of Leacock & Chodorow [11], which
exploits the shortest path between the two concepts and the depth of the ontology:


$Sim_{leacock}(c_1, c_2) = -\log \left( \frac{\min_i |path_i(c_1, c_2)|}{2D} \right)$    (8)


where min_i |path_i(c_1, c_2)| is the length of the shortest path between the two concepts
c_1 and c_2, and D is the maximum depth of the ontology.
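
As an illustration of this measure, the sketch below computes it on a tiny made-up hierarchy with a breadth-first search. It assumes the usual negative-log form of Leacock & Chodorow, counts path length in nodes (so that identical concepts have length 1), and returns 0 for unconnected concepts; these conventions and the toy graph are assumptions, not details taken from our system.

```python
import math
from collections import deque


def shortest_path_length(graph, c1, c2):
    """Number of nodes on the shortest path from c1 to c2, or None if unreachable."""
    if c1 == c2:
        return 1
    seen = {c1}
    queue = deque([(c1, 1)])
    while queue:
        node, length = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == c2:
                return length + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, length + 1))
    return None


def leacock_chodorow(graph, c1, c2, max_depth):
    """Similarity as -log(shortest path length / (2 * maximum depth))."""
    length = shortest_path_length(graph, c1, c2)
    if length is None:
        return 0.0  # convention chosen here for unconnected concepts
    return -math.log(length / (2.0 * max_depth))


# Toy undirected is-a hierarchy: root -> {a, b}, a -> a1. Depth is 3 nodes.
toy_graph = {
    "root": ["a", "b"],
    "a": ["root", "a1"],
    "b": ["root"],
    "a1": ["a"],
}
sim_same = leacock_chodorow(toy_graph, "a1", "a1", 3)
sim_parent = leacock_chodorow(toy_graph, "a1", "a", 3)
sim_cousin = leacock_chodorow(toy_graph, "a1", "b", 3)
```

The similarity decreases as the path gets longer, so a concept is most similar to itself, then to its parent, then to more distant concepts.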

We  used  an  aggregation  function  [12]  for  ranking  the  retrieved  documents  and  computing  the 
similarity  between  two  groups  of  concepts: 


$Sim(g_1, g_2) = 0.5 \times \left( \frac{\sum_{c \in g_1} Maxsim(c, g_2) \times idf(c)}{\sum_{c \in g_1} idf(c)} + \frac{\sum_{c \in g_2} Maxsim(c, g_1) \times idf(c)}{\sum_{c \in g_2} idf(c)} \right)$    (9)


where Maxsim(c, g) is the maximum, over the concepts of the group g, of the similarity with
the concept c given by equation (8).
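
The aggregation of equation (9) can be sketched as follows. The concept-to-concept similarity is passed in as a function, so any measure (for instance equation 8) could be plugged in; the exact-match similarity, the idf values and the concept labels used in the example are placeholders.

```python
def max_sim(concept, group, sim):
    """Maxsim(c, g): best similarity between `concept` and any concept of `group`."""
    return max(sim(concept, other) for other in group)


def directed_similarity(source, target, sim, idf):
    """idf-weighted average of Maxsim over the concepts of `source`."""
    total_idf = sum(idf[c] for c in source)
    if total_idf == 0:
        return 0.0
    return sum(max_sim(c, target, sim) * idf[c] for c in source) / total_idf


def group_similarity(g1, g2, sim, idf):
    """Sim(g1, g2): mean of the two directed similarities (eq. 9)."""
    if not g1 or not g2:
        return 0.0
    return 0.5 * (directed_similarity(g1, g2, sim, idf)
                  + directed_similarity(g2, g1, sim, idf))


def exact_match(c1, c2):
    """Placeholder similarity: 1 for identical concepts, 0 otherwise."""
    return 1.0 if c1 == c2 else 0.0


toy_idf = {"A": 1.0, "B": 3.0}  # hypothetical idf values
topic_concepts = ["A", "B"]
document_concepts = ["B"]
similarity = group_similarity(topic_concepts, document_concepts, exact_match, toy_idf)
```

In the example, the topic-to-document direction gives 3/4 (only B is matched, weighted by idf) and the document-to-topic direction gives 1, so the aggregate is 0.875. The measure is symmetric by construction.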

The results of this approach (Semantic IR in Table 2) for the 2011 and 2012 topics
were weak, which we attribute to the fact that the measure we used exploits the ontology
structure only. We plan to test other semantic measures that have given good results in
other experiments [13].


Run                          MAP      P@10     R-prec
DFR-Concept (topics 2011)    0.2160   0.3559   0.2473
DFR-Concept (topics 2012)    0.2103   0.4128   0.2651
Semantic IR (topics 2011)    0.1149   0.2353   0.1715
Semantic IR (topics 2012)    0.1838   0.3362   0.2380

Table 2. Comparison between the two concept-only approaches
(topics 2011 and 2012, non-official results).
Fig 4. MAP for the ‘DFR concept-based’ and ‘semantic IR concept-based’ approaches: only
concepts are kept (the original words are deleted) (topics 2012, non-official results).


4.  Conclusion 

We have presented our system, which uses the Terrier platform for indexing and
retrieval, and MetaMap for conceptualization. We focused on the DFR weighting model
In_expC2 and measured the impact of expanding documents and topics with concepts. Lastly,
we presented some non-official runs that employ a concept-only representation of documents
and topics, including a semantic measure that exploits the relationships between concepts.
Other measures will be tested in the future, and a good integration within the
probabilistic model remains to be found.
Fig 3. MAP for each topic for the 3 submitted runs LSIS1, LSIS2 and LSIS3 (official results)


